---
title: "Country Comparisons"
author: "Isabelle Rennenberg"
output: 
  html_document: 
    code_folding: hide
    toc: true
    toc_float: true
    toc_collapsed: true
    toc_depth: 4
    theme: "spacelab"
    number_sections: true
    code_download: true
      
    
---  

<style type="text/css">

h1.title {
  font-size: 38px;
  color: #0000FF;
  text-align: center;
}
h4.author { /* Header 4 - and the author and data headers use this too  */
    font-size: 18px;
  font-family: "Times New Roman", Times, serif;
  color: #41b6c4;
  text-align: center;
}
h1 {
    font-size: 22px;
    font-family: "Times New Roman", Times, serif;
    color: #41b6c4;
    text-align: left;
}
</style>


```{r setup, include=FALSE}
options(repos = list(CRAN="http://cran.rstudio.com/"))
if (!require("tidyverse")) {
   install.packages("tidyverse")
   library(tidyverse)
}
if (!require("knitr")) {
   install.packages("knitr")
   library(knitr)
}
if (!require("cowplot")) {
   install.packages("cowplot")
   library(cowplot)
}
if (!require("latex2exp")) {
   install.packages("latex2exp")
   library(latex2exp)
}
if (!require("plotly")) {
   install.packages("plotly")
   library(plotly)
}
if (!require("gapminder")) {
   install.packages("gapminder")
   library(gapminder)
}
if (!require("png")) {
    install.packages("png")    
    library("png")
}
if (!require("RCurl")) {
    install.packages("RCurl")    
    library("RCurl")
}
if (!require("colourpicker")) {
    install.packages("colourpicker")              
    library("colourpicker")
}
if (!require("gganimate")) {
    install.packages("gganimate")              
    library("gganimate")
}
if (!require("gifski")) {
    install.packages("gifski")              
    library("gifski")
}
if (!require("magick")) {
    install.packages("magick")              
    library("magick")
}
if (!require("grDevices")) {
    install.packages("grDevices")              
    library("grDevices")
}
if (!require("jpeg")) {
    install.packages("jpeg")              
    library("jpeg")
}
if (!require("ggridges")) {
    install.packages("ggridges")              
    library("ggridges")
}
if (!require("plyr")) {
    install.packages("plyr")              
    library("plyr")
}
if (!require("ggiraph")) {
    install.packages("ggiraph")              
    library("ggiraph")
}
if (!require("highcharter")) {
    install.packages("highcharter")              
    library("highcharter")
}
if (!require("forecast")) {
    install.packages("forecast")              
    library("forecast")
}
## 
knitr::opts_chunk$set(echo = TRUE,       
                      warning = FALSE,   
                      results = "markup",   
                      message = FALSE,
                      comment = NA)
```
# Data Preparation

Four data sets were imported and reshaped to build the data set used in this analysis: one each for income, life expectancy, population size, and region. They were combined into a final data set with the following variables: 

* **Country**
* **Income**
* **Year**
* **Life Expectancy**
* **Region**
* **Population Size**

The following sections evaluate this final data set. The code below was used to create it. 

```{r}

## Read the four source files from GitHub

income <- read.csv(file="https://isarenn.github.io/irennenberg/Week5/income_per_person.csv")
life <- read.csv(file="https://isarenn.github.io/irennenberg/Week5/life_expectancy_years.csv")
population <- read.csv(file="https://isarenn.github.io/irennenberg/Week5/population_total.csv")
region <- read.csv(file="https://isarenn.github.io/irennenberg/Week5/countries_total.csv")

##reshape income so that there are three columns, country year and income

income2 <- income %>%
  gather(key="Year",
         value="Income",
         - geo,
         na.rm=TRUE)
income3 <- income2 %>%
  mutate(Year=substr(Year,2,5))

names(income3)[1]='Country'

##reshape life expectancy : country, year, life expectancy

life2 <- life %>%
  gather(key="Year",
         value="LifeExp",
         - geo,
         na.rm=TRUE)
life3 <- life2 %>%
  mutate(Year=substr(Year,2,5))

names(life3)[1]='Country'

##reshaping population in the same way. 

pop2 <- population %>%
  gather(key="Year",
         value="Population",
         - geo,
         na.rm=TRUE)
pop3 <- pop2 %>%
  mutate(Year=substr(Year,2,5))

names(pop3)[1]='Country'

##Merge/join the above life and income sets to create one new dataset 'LifeExpIncom' with variables country,year,lifeexp and income

LifeExpIncom <- merge(income3, life3, by=c("Year","Country"))

## Merge the above dataset with the region dataset, so that the resulting dataset has country, year, income, lifeexp and region

names(region)[1]='Country'
region2 <- subset(region, select=-c(alpha.2, alpha.3, country.code, iso_3166.2, sub.region, intermediate.region, region.code, sub.region.code, intermediate.region.code))

LifeExpIncom2 <- merge(LifeExpIncom, region2, by="Country")

## Merge the above dataset with population so that the final dataset has income, lifeexp, region and population size

LifeExpIncom3 <- merge(LifeExpIncom2, pop3, by=c("Year","Country"))

write.csv(LifeExpIncom3, "LifeExpIncomFinal.csv")

```
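
The reshaping step above turns one column per year into one row per country and year. As a minimal sketch of the same idea, assuming a made-up two-country table (`toy_wide` and its values are hypothetical, not taken from the source files), the current tidyr function `pivot_longer()` can do the reshape and strip the `X` prefix that `read.csv()` adds to year column names in one call. This is an alternative to the `gather()` and `substr()` combination used above, not the method used in this report.

```{r}
library(tidyr)

# Toy wide table in the shape read.csv() produces for the source files:
# one row per country, one column per year, with an "X" prefix on year names.
# All values are invented for illustration.
toy_wide <- data.frame(
  geo   = c("Country A", "Country B"),
  X1800 = c(600, 700),
  X1801 = c(610, NA)
)

toy_long <- pivot_longer(
  toy_wide,
  cols           = -geo,      # every column except the country column
  names_to       = "Year",
  names_prefix   = "X",       # drop the prefix instead of calling substr()
  values_to      = "Income",
  values_drop_na = TRUE       # same role as na.rm = TRUE in gather()
)
names(toy_long)[1] <- "Country"

toy_long
```

The missing value for Country B in 1801 is dropped, so the long table has three rows.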

# Data Set Description

The final data set has 37,590 observations and 6 variables. The source data sets were retrieved from the course project data repository, under the World Life Expectancy data section, and come from a public-domain source. 
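
Because `merge()` keeps only keys found in both tables by default, any country whose name is spelled differently in two source files would be dropped without a warning. Whether that happens in this data set is not checked here. The sketch below only illustrates how such a check could look, using hypothetical tables (`toy_values`, `toy_regions`) with an invented name mismatch.

```{r}
# Hypothetical rows; the name mismatch is invented to illustrate the problem.
toy_values  <- data.frame(Country = c("Country A", "Country B", "Country C"),
                          Income  = c(1000, 2000, 3000))
toy_regions <- data.frame(Country = c("Country A", "Country B (long form)"),
                          region  = c("Asia", "Europe"))

# merge() performs an inner join unless all, all.x or all.y is set.
toy_merged <- merge(toy_values, toy_regions, by = "Country")

# Countries that were present before the merge but are missing afterwards.
toy_dropped <- setdiff(toy_values$Country, toy_merged$Country)

nrow(toy_merged)
toy_dropped
```

Running the same `setdiff()` comparison on the real tables would list any countries lost at each merge step.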

# Data For Year 2000 and 2015

Two subsets were then created from the final data set, one for the year 2000 and one for 2015, using the code below. 

```{r}

d2000data <- LifeExpIncom3 %>%
  filter(Year=='2000')

d2015data <- LifeExpIncom3 %>%
  filter(Year=='2015')

```

# Usage of ggplot Scatterplot

A scatter plot of Income and Life Expectancy was created for the year 2000. 

The plot below shows that life expectancy is very low when income is close to zero. As income increases, so does life expectancy. Life expectancy rises very sharply as income goes from 0 to about 1,500, then levels off as income increases further. 

```{r, fig.align='center'}
d2000.income <- d2000data$Income
d2000.lifeexp <- d2000data$LifeExp
d2000.country <- d2000data$Country
d2000.region <- d2000data$region
d2000.pop <- d2000data$Population

# Columns are referenced by name inside aes(), since `data` is supplied
ggplot(data = d2000data, mapping = aes(x = Income, y = LifeExp)) +
  geom_point() +
  labs(
    x = "Income",
    y = "Life Expectancy",
    title = "Association between Income and Life Expectancy",
    subtitle = "Data for the Year 2000",
    caption = paste("Created on", Sys.Date())
  ) +
  theme_gray()
```
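
Since most of the change in life expectancy happens at low incomes, a logarithmic income axis can make that part of the plot easier to read. The animated plot later in this report uses a log axis in plotly. The sketch below shows the ggplot2 equivalent on a small made-up data frame (`toy_2000` and its values are hypothetical).

```{r}
library(ggplot2)

# Made-up values that only mimic the general shape of the year-2000 data.
toy_2000 <- data.frame(
  Income  = c(500, 1500, 5000, 20000, 60000),
  LifeExp = c(48, 58, 68, 76, 80)
)

toy_plot <- ggplot(toy_2000, aes(x = Income, y = LifeExp)) +
  geom_point() +
  scale_x_log10() +   # spreads out the crowded low-income end of the axis
  labs(x = "Income (log scale)", y = "Life Expectancy")

toy_plot
```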

# Manipulating the ggplot to Include the Third Numeric Variable

Now, the point size will be manipulated to display the population size for the same year.

The plot now draws larger points for larger populations. Countries with large populations tend to sit toward the lower end of the income range, while the very-high-income points are small, indicating smaller populations. 

```{r, fig.align='center'}

# Point size is mapped to the Population column
ggplot(data = d2000data, mapping = aes(x = Income, y = LifeExp, size = Population)) +
  geom_point() +
  labs(
    x = "Income",
    y = "Life Expectancy",
    size = "Population Size",
    title = "Association between Income and Life Expectancy",
    subtitle = "Data for the Year 2000",
    caption = paste("Created on", Sys.Date())
  ) +
  theme_gray()

```

# Manipulating the ggplot to Include the Categorical Region Variable

Next, point color is mapped to region, which has five categories for the same year: 

* **Africa**
* **Americas**
* **Asia**
* **Europe**
* **Oceania** 

This final scatter plot shows distinct clusters for some regions. Africa is concentrated at lower income and lower life expectancy. Asia spans the widest income range, from just above the Africa cluster to the far right of the plot (the highest incomes). Oceania, the Americas and Europe fall within the range covered by Asia. Regions are not confined to a single area of the plot: each has a typical location, but some points fall outside it. For example, a few Africa points sit among the Americas/Europe/Asia points. 

Note that Oceania contributes far fewer points than the other regions. 

```{r, fig.align='center'}

# Point size is mapped to Population and point color to region
ggplot(data = d2000data,
       mapping = aes(x = Income, y = LifeExp, size = Population, color = region)) +
  geom_point() +
  scale_color_manual(values = c("#FFD700", "#EA5F94", "#0000FF", "#84FFA9", "#FF8F00")) +
  labs(
    x = "Income",
    y = "Life Expectancy",
    size = "Population Size",
    color = "Region",
    title = "Association between Income and Life Expectancy",
    subtitle = "Data for the Year 2000",
    caption = paste("Created on", Sys.Date())
  ) +
  theme_gray()

```

# Interactive Plot for Year 2015

An interactive plot was created for the 2015 dataset. The interactive scatter plot was made to display the association between life expectancy and income. 

Because the population values are so large, marker size was not taken from the raw population. Instead, it was computed from a log transform of population: `(2*log(d2015.pop)-11)^2`.
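
To see what the marker-size expression `(2*log(pop)-11)^2` does, the sketch below applies it to three illustrative populations. The helper name `pop_to_size` and the input values are hypothetical. Only the expression itself comes from the plotting code in this section.

```{r}
# Marker-size expression from this section, wrapped in a helper
# (the function name is hypothetical).
pop_to_size <- function(pop) (2 * log(pop) - 11)^2

toy_pop   <- c(1e5, 1e7, 1e9)   # illustrative populations
toy_sizes <- pop_to_size(toy_pop)

# A 10,000-fold range in population becomes a much narrower range of sizes.
data.frame(Population = toy_pop, Size = round(toy_sizes, 1))
max(toy_sizes) / min(toy_sizes)
```

The expression increases with population only when `2*log(pop)` exceeds 11, that is, for populations above roughly 245, which should hold for country-level totals.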

```{r, fig.align='center', fig.width=5, fig.height=6}
d2015.income <- d2015data$Income
d2015.lifeexp <- d2015data$LifeExp
d2015.country <- d2015data$Country
d2015.region <- d2015data$region
d2015.pop <- d2015data$Population
d2015.pop2 <- d2015.pop/10000000
pal <- c("#ffbe0b", "#fb5607", "#ff006e","#8338ec","#3a86ff")

plot_ly(
    data = d2015data,
    x = ~d2015.income,  # Horizontal axis 
    y = ~d2015.lifeexp,   # Vertical axis 
    color = ~factor(d2015.region),  # one color per region
    colors=pal,
    text = ~paste("Population Size: ", d2015.pop,
                   "<br>Region:", d2015.region,
                   "<br>Country:", d2015.country), 
     # Population, region and country reach the hover text through `text`.
     # The hovertemplate below adds the two numeric variables (%{x}, %{y}).
     hovertemplate = paste('<i><b>Life Expectancy</b></i>: %{y}',
                           '<br><b>Income</b>:  %{x}',
                           '<br><b>%{text}</b>'),
     alpha  = 0.6,
     # Marker area grows with a log transform of population
     marker = list(size = ~(2*log(d2015.pop)-11)^2, sizeref = .05, sizemode = 'area' ),
     type = "scatter",
     mode = "markers",
     ## graphic size
     width = 700,
     height = 350
   ) %>%
    layout(  
      ### Title 
      title =list(text = "Income vs Life Expectancy for the Year 2015", 
                  font = list(family = "Times New Roman",  # HTML font family  
                                size = 18,
                               color = "#ff006e")), 
      ### legend
      legend = list(title = list(text = 'Region',
                                 font = list(family = "Times New Roman",
                                               size = 14,
                                              color = "black")),
                    bgcolor = "#f7f7f7",
                    bordercolor = "black",
                    groupclick = "togglegroup",  # one of  "toggleitem" AND "togglegroup".
                    orientation = "v"  # Sets the orientation of the legend.
                    ),
      ## Backgrounds
      plot_bgcolor ='#f7f7f7', 
      ## Axes labels
             xaxis = list( 
                    title=list(text = 'Income',
                               font = list(family = 'Times New Roman')),
                    zerolinecolor = 'gray', 
                    zerolinewidth = 2, 
                    gridcolor = 'white'), 
            yaxis = list( 
                    title=list(text = 'Life Expectancy',
                               font = list(family = 'Times New Roman')),
                    zerolinecolor = 'purple', 
                    zerolinewidth = 2, 
                    gridcolor = 'white'),
       ## annotations
       annotations = list(  
                     x = 0.7,   # between 0 and 1. 0 = left, 1 = right
                     y = 1,     # between 0 and 1, 0 = bottom, 1 = top
                  font = list(size = 12,
                              color = "#ff006e"),   
                  text = "Point size increases with population size.",   
                  xref = "paper",  # "container" spans the entire `width` of the 
                                   #  plot. "paper" refers to the width of the 
                                   #  plotting area only.
                  yref = "paper",  #  same as xref.
               xanchor = "center", #  horizontal alignment with respect to its x position
               yanchor = "bottom", #  similar to xanchor  
             showarrow = FALSE)
    )
```

# Animated Plot

This animated plot shows how the association between income and life expectancy changes over time across the entire dataset. 

```{r}

data.income <- LifeExpIncom3$Income
data.lifeexp <- LifeExpIncom3$LifeExp
data.country <- LifeExpIncom3$Country
data.region <- LifeExpIncom3$region
data.pop <- LifeExpIncom3$Population
data.year <- LifeExpIncom3$Year
pal.IBM <- c("#ffbe0b", "#fb5607", "#ff006e","#8338ec","#3a86ff")
pal.IBM <- setNames(pal.IBM, c("Asia", "Europe", "Africa", "Americas", "Oceania"))

df <- LifeExpIncom3 
fig <- df %>%
  plot_ly(
    x = ~data.income, 
    y = ~data.lifeexp, 
    size = ~(2*log(data.pop)-11)^2,
    color = ~data.region, 
    colors = pal.IBM,   # custom colors
    #marker = list(size = ~(log(pop)-10),  sizemode = 'area'),
    frame = ~data.year,      # the time variable that drives the animation
    # text to display in the hover
    text = ~paste("Country:", data.country,
                  "<br>Continent:", data.region,
                  "<br>Year:", data.year,
                  "<br>Income:", data.income,
                  "<br>LifeExp:", data.lifeexp,
                  "<br>Population:", data.pop),
    hoverinfo = "text",
    type = 'scatter',
    mode = 'markers'
  ) %>%
  layout(
    title=list(text="Income vs Life Expectancy from 1800-2018",
               font=list(family="Times New Roman",
                         size=18,
                         color="#ff006e")),
    xaxis = list( 
                    title=list(text = 'Income',
                               font = list(family = 'Times New Roman')),
                    gridcolor = 'white'), 
            yaxis = list( 
                    title=list(text = 'Life Expectancy',
                               font = list(family = 'Times New Roman')),
                    gridcolor = 'white')
  )
fig <- fig %>% layout(
    xaxis = list(
      type = "log"
    )
  )

fig
```
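
As a smaller illustration of how the animation is driven, the sketch below builds a plotly scatter with one frame per year from a hypothetical six-row data frame (`toy_anim` and its values are made up). It references columns by name through formulas and uses `ids` so that each country's marker is tracked from frame to frame. This is a simplified variant, not the exact code used above.

```{r}
library(plotly)

# Made-up values: two countries observed in three years.
toy_anim <- data.frame(
  Year    = rep(c("1900", "1950", "2000"), each = 2),
  Country = rep(c("Country A", "Country B"), times = 3),
  Income  = c(800, 1500, 1200, 4000, 3000, 20000),
  LifeExp = c(35, 40, 45, 60, 62, 78)
)

toy_fig <- plot_ly(
  toy_anim,
  x = ~Income,
  y = ~LifeExp,
  frame = ~Year,      # one animation frame per year
  ids   = ~Country,   # keeps each country's marker consistent across frames
  type  = "scatter",
  mode  = "markers"
) %>%
  layout(xaxis = list(type = "log"))   # log income axis, as in the plot above

toy_fig
```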

# Final Conclusions 

The original data sets were manipulated to create one large dataset with the following variables: country, income, year, life expectancy, region and population size. 

As the animated scatter plot in the previous section shows, both income and life expectancy increased between 1800 and 2018, the full span of the data sets. Life expectancy dropped sharply on several occasions, notably around the time of the World Wars. The relationship between life expectancy and income persists throughout the period.  
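
The drops in life expectancy were read off the animation by eye. One way to locate them numerically would be to summarize life expectancy per year and look at the year-over-year change. The sketch below does this on made-up data with one artificial dip (`toy_years` and its values are hypothetical). On the real data, `Year` is stored as text and would first need `as.integer()`.

```{r}
# Made-up yearly values for two countries, with one artificial dip in 1915.
toy_years <- data.frame(
  Year    = rep(1913:1916, each = 2),
  LifeExp = c(50, 52, 51, 53, 40, 44, 50, 54)
)

# Median life expectancy per year, then the change from the previous year.
toy_median <- aggregate(LifeExp ~ Year, data = toy_years, FUN = median)
toy_median$Change <- c(NA, diff(toy_median$LifeExp))

# Year with the largest drop relative to the year before.
toy_worst_year <- toy_median$Year[which.min(toy_median$Change)]
toy_worst_year
```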

The three graphs created from the 2000 data all show a clear relationship between income and life expectancy. Points with high income and high life expectancy were all small, indicating lower populations, with the exception of one Americas point in the final graph. Coloring by region made it evident that Africa is densely clustered at lower income and lower life expectancy. The other four regions (Oceania, Europe, Asia and the Americas) were, for the most part, higher than Africa in both life expectancy and income, although some Africa points fell among the other regions. After Africa, Asia showed the most visible grouping, with a cluster of points in the mid-range of the graph. 

The graph showed a sharp increase in life expectancy over the 0 to ~1,500 income range, and beyond an income of ~30,000 life expectancy remained stable. Life expectancy was lowest when income was very close to 0 and rose steeply as income moved away from zero, until it leveled off. 

The interactive plot for 2015 shows similar relationships. Notably, the point with the highest income is Qatar, and the point with the lowest income (and life expectancy) is the Central African Republic. As in the 2000 scatter plot, life expectancy rises sharply at low incomes and then levels out, at an income of around 20,000. 
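
The extreme points named above were identified by hovering over the interactive plot. As a rough sketch of how the same lookup could be done in code, the example below uses `which.max()` and `which.min()` on a hypothetical stand-in for the 2015 subset (`toy_2015` and its values are made up, so the country names are placeholders).

```{r}
# Hypothetical rows standing in for the 2015 subset.
toy_2015 <- data.frame(
  Country = c("Country A", "Country B", "Country C"),
  Income  = c(700, 15000, 90000),
  LifeExp = c(51, 72, 80)
)

toy_highest_income <- toy_2015$Country[which.max(toy_2015$Income)]
toy_lowest_income  <- toy_2015$Country[which.min(toy_2015$Income)]
toy_lowest_lifeexp <- toy_2015$Country[which.min(toy_2015$LifeExp)]

c(highest_income = toy_highest_income,
  lowest_income  = toy_lowest_income,
  lowest_lifeexp = toy_lowest_lifeexp)
```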

Thank you and good night! 